10. Assessing vs. Exploring
In the context of this dataset, assessing is everything you just identified, like spotting:
- Missing HbA1c changes
- Poorly formatted zip codes (e.g., four digits and float data type instead of five digits and string or object data type)
- Multiple state formats (e.g., NY and New York)
- Incorrect patient height values (e.g., 27 inches instead of 72 inches)
Assessing is also identifying structural (tidiness) issues that make analysis difficult.
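As a rough illustration, the quality checks listed above could be run programmatically with pandas. This is only a sketch: the toy rows, the column names (`zip_code`, `height`, `hba1c_change`), and the height thresholds are assumptions made for the example, not the actual clinical trial tables.

```python
import pandas as pd

# Hypothetical toy rows; the real patients and treatments tables are not shown here.
patients = pd.DataFrame({
    "patient_id": [1, 2, 3, 4],
    "state": ["NY", "New York", "CA", "California"],
    "zip_code": [10001.0, 7095.0, 94105.0, None],  # stored as floats by mistake
    "height": [72, 27, 65, 70],                    # inches
})
treatments = pd.DataFrame({
    "patient_id": [1, 2, 3, 4],
    "hba1c_change": [0.43, None, 0.32, None],
})

# Missing HbA1c changes: count the null entries in the column.
missing_hba1c = treatments["hba1c_change"].isnull().sum()

# Zip code data type: float instead of string/object.
zip_dtype = patients["zip_code"].dtype

# Zip codes with fewer than five digits (a leading zero was dropped).
short_zips = patients[patients["zip_code"] < 10000]

# Multiple state formats: full names mixed in with two-letter abbreviations.
full_name_states = patients[patients["state"].str.len() > 2]

# Implausible heights: outside an assumed adult range of 48 to 90 inches.
odd_heights = patients[(patients["height"] < 48) | (patients["height"] > 90)]

print(missing_hba1c, zip_dtype, len(short_zips), len(full_name_states), len(odd_heights))
```

Each check only flags candidate issues; deciding whether 27 inches is a typo for 72 still takes human judgment.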
The discovery of these data quality and tidiness issues helps ensure that the analysis can be executed, which for this clinical trial data includes calculating average patient metrics (e.g., age, weight, height, and BMI) and calculating the confidence interval for the difference in HbA1c change means between Novodra and Auralin patients.
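To make the second part of that analysis concrete, here is a minimal sketch of a confidence interval for a difference in means. It assumes a normal approximation with unequal variances; the actual trial analysis may use a different method (for example a t-distribution or bootstrapping). The function name `diff_in_means_ci` and the sample values are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, variance


def diff_in_means_ci(a, b, confidence=0.95):
    """Approximate CI for mean(a) - mean(b) using a normal approximation."""
    diff = mean(a) - mean(b)
    # Standard error of the difference, allowing unequal sample variances.
    se = sqrt(variance(a) / len(a) + variance(b) / len(b))
    # Two-sided critical value, about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return diff - z * se, diff + z * se


# Made-up HbA1c changes for illustration only, not trial results.
novodra = [0.42, 0.38, 0.45, 0.40, 0.37]
auralin = [0.39, 0.36, 0.41, 0.35, 0.38]

low, high = diff_in_means_ci(novodra, auralin)
print(f"95% CI for the difference in means: ({low:.3f}, {high:.3f})")
```

With samples this small, a t-based interval would be wider and more appropriate; the normal approximation is used here only to keep the sketch dependency-free. Missing HbA1c changes would have to be handled before a calculation like this can run, which is why assessing comes first.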
Exploring, in the context of this dataset, might be:
- Using summary statistics like count on the state column or mean on the weight column to see if patients from certain states or of certain weights are more likely to have diabetes, which we can use to exclude certain patients from the analysis and make it less biased
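A small pandas sketch of those summary statistics might look like the following. The rows are invented for illustration, and the `state` and `weight` column names are assumed to match the patients table.

```python
import pandas as pd

# Hypothetical rows standing in for the patients table.
patients = pd.DataFrame({
    "state": ["NY", "CA", "NY", "TX", "CA", "NY"],
    "weight": [180.0, 150.5, 210.2, 165.0, 175.3, 190.0],  # pounds
})

# Count of non-null entries in the state column.
n_states_recorded = patients["state"].count()

# Number of patients per state.
state_counts = patients["state"].value_counts()

# Mean weight overall and broken down by state.
mean_weight = patients["weight"].mean()
mean_weight_by_state = patients.groupby("state")["weight"].mean()

print(state_counts)
print(mean_weight_by_state)
```

Unlike the assessing checks, nothing here flags a defect; the output is a set of patterns to look at when deciding what questions to ask of the data.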
Exploring, in the context of a clinical trial, is less likely to happen given that clinical trials are expensive and involve extensive pre-planning. So exploring on this dataset would likely happen exclusively before the treatments and adverse_reactions tables were created, i.e., before the clinical trial was conducted.